Use of Extended Phylogenetic Profiles with E-Values and Support Vector Machines for Protein Family Classification

Authors

  • Kishore Narra
  • Li Liao
Abstract

Protein family classification is an important means of assigning functions to proteins. Phylogenetic profiles, which encode the evolutionary history of proteins along with putative homologs, have been shown to facilitate this classification. We propose a new approach to comparing phylogenetic profiles that incorporates the phylogenetic tree from which the profiles are derived. Specifically, each profile is extended with new bits corresponding to the internal nodes of the tree, which encode the correlations among the bits in the original profile. This extension allows E-values to be used directly, instead of imposing an ad hoc cut-off to derive the binary profiles commonly used in previous methods. A scoring scheme is adopted for measuring the similarity among these extended profiles, and the resulting scores are passed to a classifier, a support vector machine with a polynomial kernel function. The method was tested on the proteome of the budding yeast Saccharomyces cerevisiae and outperformed a similar method that uses phylogenetic tree information as a tree kernel.
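
The profile extension described in the abstract can be illustrated with a small sketch. The abstract does not specify how internal-node values are computed or how E-values are transformed, so everything below is an assumption made for illustration: leaf values are hypothetical real-valued scores in [0, 1] (for example, some transform of E-values), each internal node takes the mean of its children, and the names `extend_profile` and `polynomial_kernel` are invented here. It is not the authors' scoring scheme.

```python
from typing import Dict, List, Union

# A tree is either a leaf (a genome name) or a list of subtrees.
Tree = Union[str, List["Tree"]]


def extend_profile(tree: Tree, leaf_scores: Dict[str, float]) -> List[float]:
    """Return the leaf scores followed by one score per internal node.

    Assumption: an internal node's score is the mean of its children's
    scores. The abstract only says that internal-node bits encode
    correlations among the original bits; the mean is a stand-in.
    """
    leaves: List[float] = []
    internal: List[float] = []

    def visit(node: Tree) -> float:
        if isinstance(node, str):
            score = leaf_scores[node]
            leaves.append(score)
            return score
        child_scores = [visit(child) for child in node]
        score = sum(child_scores) / len(child_scores)
        internal.append(score)  # internal nodes are recorded in post-order
        return score

    visit(tree)
    return leaves + internal


def polynomial_kernel(x: List[float], y: List[float],
                      degree: int = 2, coef0: float = 1.0) -> float:
    """Polynomial kernel (x . y + coef0) ** degree on two extended profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


# Hypothetical three-genome tree: ((genome_A, genome_B), genome_C).
tree = [["genome_A", "genome_B"], "genome_C"]
scores = {"genome_A": 1.0, "genome_B": 0.5, "genome_C": 0.0}

extended = extend_profile(tree, scores)
print(extended)  # [1.0, 0.5, 0.0, 0.75, 0.375]
print(polynomial_kernel(extended, extended))
```

The first three entries of `extended` are the original profile; the last two are the added internal-node values. Because the leaf values are real numbers rather than thresholded bits, no cut-off on E-values is needed before the profiles are compared.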


Similar resources

Remote Sensing and Land Use Extraction for Kernel Functions Analysis by Support Vector Machines with ASTER Multispectral Imagery

Land use is considered a key element in land change studies, environmental planning and natural resource applications. Studying the Earth's surface by remote sensing has many benefits, such as continuous data acquisition, broad regional coverage, cost-effective data, map-accurate data, and large archives of historical data. To study land use / cover, remote sensing as an effic...


Face Recognition using Eigenfaces, PCA and Support Vector Machines

This paper is based on a combination of principal component analysis (PCA), eigenfaces and support vector machines. Using the N-fold method, and depending on the value of N, each person's face images are divided into two sections. As a result, vectors of training features and test features are obtained. Classification precision and accuracy were examined with three different types of kernel and...


Using extended phylogenetic profiles and support vector machines for protein family classification

We proposed a new approach to compare profiles when the correlations among attributes can be represented as a tree. To account for these correlations, the profile is extended with new bits corresponding to the internal nodes of the tree, which encode the correlations. An ad hoc scoring scheme is adopted for measuring the similarity among these extended profiles, and the scores thus obtained are...


Transductive learning with EM algorithm to classify proteins based on phylogenetic profiles

We proposed a novel method for protein classification based on phylogenetic profiles. Each protein's profile was extended with extra bits encoding the phylogenetic tree structure and the likelihood, in the form of weights on profile indices, of the protein's functional family membership in each of the reference genomes. The extended profiles were then integrated as part of a kernel of a support...


A comparative study of performance of K-nearest neighbors and support vector machines for classification of groundwater

The aim of this work is to examine the feasibility of support vector machine (SVM) and K-nearest neighbor (K-NN) classifiers for the classification of an aquifer in the Khuzestan Province, Iran. For this purpose, 17 groundwater quality variables including EC, TDS, turbidity, pH, total hardness, Ca, Mg, total alkalinity, sulfate, nitrate, nitrite, fluoride, phosphate, Fe, Mn, Cu, ...




Publication date: 2005